Abstract

An algorithm is presented which, with optimal efficiency, solves the problem of uniform 
random generation of distribution functions for an n-valued random variable. 



1 Introduction 

In the general framework of Probabilistic Inference, it often happens that, for experimental
or empirical validation purposes, one needs to generate unbiased collections of distribution
functions for some discrete random variable. In this note, an algorithm is presented which
efficiently solves the problem.

2 The Problem 

Let $X$ be a discrete random variable whose outcomes belong to a finite set of elementary
events $\Omega$, and let $n$ indicate the cardinality of $\Omega$. Let $I_n$ be the totality of
distribution functions for $X$.

Problem: find an algorithm that samples $I_n$ uniformly, whatever the value of $n$.

In the following, we shall rely on the existence of a subroutine, A, able to return sequences
of pseudo-random numbers uniformly distributed in the interval [0,1]. The existence of such a
subroutine is thoroughly discussed in 

Let us start by observing that $I_n$ is naturally parametrized by $n$ numbers, $x_1, \ldots, x_n$,
satisfying the conditions:

$$0 \le x_i \le 1, \quad \forall i \in \{1, \ldots, n\}, \tag{1}$$

and

$$\sum_{i=1}^{n} x_i = 1. \tag{2}$$

$I_n$ is therefore the $(n-1)$-simplex, $S^{n-1}$, and our problem is equivalent to finding an
algorithm for the uniform sampling of $S^{n-1}$. It may be worth recalling that the $n$-volume of
the $n$-simplex tends to zero (super-)exponentially in $n$ (it equals $1/n!$). This implies that
the naive sampling strategy of generating points within the unit $n$-cube and discarding those
falling outside $S^n$ is virtually inapplicable, even for very small values of $n$. Other
approaches, such as generating points within the unit $n$-cube and rescaling them so as to
satisfy the condition $\sum_{i=1}^{n} x_i = 1$, are plainly wrong.
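
The bias of the rescaling recipe can be seen numerically in the simplest case, n = 2. The sketch below is only an illustration: the function names are hypothetical, and Python's `random.Random` stands in for subroutine A. If the point were uniform on the 1-simplex, its first coordinate would be uniform on [0,1], with variance 1/12 (about 0.083). The rescaled cube point instead has variance 3/4 - ln 2 (about 0.057), so it is too concentrated around the centre of the simplex.

```python
import random


def rescaled_cube_point(n, rng):
    """Flawed recipe: draw a point in the unit n-cube and divide by its coordinate sum."""
    u = [rng.random() for _ in range(n)]
    s = sum(u)  # zero only if every draw is exactly 0.0, which is ignored in this sketch
    return [v / s for v in u]


def variance_of_first_coordinate(n, samples, rng):
    """Empirical variance of the first coordinate over `samples` rescaled points."""
    xs = [rescaled_cube_point(n, rng)[0] for _ in range(samples)]
    mean = sum(xs) / samples
    return sum((x - mean) ** 2 for x in xs) / samples


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
observed = variance_of_first_coordinate(2, 100_000, rng)
uniform = 1.0 / 12.0  # variance of x_1 under the uniform distribution on the 1-simplex
print(observed, uniform)
```

The gap between the two printed values is far larger than the sampling error at this sample size, which is enough to rule the recipe out.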

3 Solution 

Here is the basic idea: for each sample point to be generated on the $(n-1)$-simplex, and for
each of its first $n-1$ coordinates, $x_1, \ldots, x_{n-1}$, randomly sample the interval [0, 1]
according to a density function able, on average, to assign to each $x_i$ "just its fair share"
of the total amount $\sum_{i=1}^{n} x_i = 1$. It does not take much to be convinced that such a
density function indeed exists for any component $x_j$: it is the marginal distribution of $x_j$
over the simplex $S^{n-1}$, given the outcomes of $x_1, \ldots, x_{j-1}$.

The proposed algorithm therefore runs as follows:

1 set $r_1 = 1$;

2 set $j = 1$;

3 while $j \le n - 1$:

3.1 randomly extract $x_j$ from $[0, r_j]$ according to the marginal distribution of $x_j$ over
the simplex $S^{n-1}$, given the outcomes already drawn for $x_1, \ldots, x_{j-1}$, that is,
according to:

$$\psi_j(x) = \mathrm{Prob}\{x_j = x \mid x_1, \ldots, x_{j-1}\}; \tag{3}$$

3.2 set $r_{j+1} = r_j - x_j$;

3.3 set $j = j + 1$;

4 set $x_n = r_n$;

5 output $(x_1, \ldots, x_n)$.

Step 3.1 is the crucial one. To perform it, we need to: (1) determine $\psi$ for any $n$ and any
set of outcomes $x_1, \ldots, x_{j-1}$; (2) sample the interval $[0, r_j]$ according to $\psi$.

3.1 Determining $\psi(x)$

Let us start by observing that the marginal distribution of $x_1$ must be proportional to the
$(n-2)$-volume of the subset of $S^{n-1}$ defined by $x_2 + x_3 + \ldots + x_n = 1 - x_1$. Let us
indicate this subset with $S_{x_1}$. For any $x_1$, $S_{x_1}$ is just a rescaling of the
$(n-2)$-simplex, and its volume is therefore proportional to $(1 - x_1)^{n-2}$ (see Fig. 1a). The
marginal distribution of $x_1$ can therefore be written in the form
$\psi(x_1) = a (1 - x_1)^{n-2}$, where the factor $a$ is determined via the normalization
condition:

Figure 1: (a) The 3-dimensional case: for any outcome of $x_1$, a rescaling of the 1-simplex is
determined. The marginal distribution of $x_1$ is therefore proportional to the 1-volume of
$S_{x_1}$. (b) 5000 samples of $S^2$ as obtained by applying the proposed algorithm.

$$a \int_0^1 (1 - x_1)^{n-2} \, dx_1 = 1. \tag{4}$$

This yields $a = n - 1$, and then:

$$\psi(x_1) = (n - 1)(1 - x_1)^{n-2}. \tag{5}$$
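
A short worked computation may help to see in what sense this density gives $x_1$ its "fair share". Assuming the density $\psi(x_1) = (n-1)(1-x_1)^{n-2}$ on $[0,1]$ with $n \ge 2$, integration by parts gives the mean of $x_1$:

```latex
\begin{aligned}
E[x_1] &= \int_0^1 x_1 \,(n-1)(1-x_1)^{n-2}\, dx_1 \\
       &= \Big[ -x_1 (1-x_1)^{n-1} \Big]_0^1 + \int_0^1 (1-x_1)^{n-1}\, dx_1 \\
       &= 0 + \frac{1}{n} = \frac{1}{n}.
\end{aligned}
```

On average, then, the first coordinate receives exactly one $n$-th of the total, as symmetry among the $n$ coordinates requires. For $n = 3$ the density is $\psi(x_1) = 2(1 - x_1)$, with mean $1/3$.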

The same process can be iterated for all the other components $x_j$, $1 < j < n$, accounting
for the fact that any $x_j$ is now limited to the range $[0, r_j]$. This implies that $\psi(x_j)$
must be proportional to $(r_j - x_j)^{n-j-1}$, and it is an interesting fact that the dependence
of $\psi(x_j)$ on the outcomes $x_1, \ldots, x_{j-1}$ is entirely contained in their sum
$1 - r_j$. Thus, we can write:

$$\beta \int_0^{r_j} (r_j - x_j)^{n-j-1} \, dx_j = 1, \tag{6}$$

which yields $\beta = (n-j)/r_j^{\,n-j}$, and, finally

$$\psi(x_j) = \frac{n-j}{r_j^{\,n-j}} \, (r_j - x_j)^{n-j-1}. \tag{7}$$
The cumulative function of the marginal distribution of $x_j$ is then:

$$\Psi(x_j) = 1 - \left(1 - \frac{x_j}{r_j}\right)^{n-j}. \tag{8}$$
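
For completeness, the intermediate step can be written out. Assuming the marginal density has the form $\psi(t) = \frac{n-j}{r_j^{\,n-j}} (r_j - t)^{n-j-1}$ on $[0, r_j]$, integrating from $0$ to $x_j$ gives the cumulative function:

```latex
\begin{aligned}
\Psi(x_j) &= \int_0^{x_j} \frac{n-j}{r_j^{\,n-j}} \,(r_j - t)^{n-j-1}\, dt \\
          &= \left[ -\frac{(r_j - t)^{n-j}}{r_j^{\,n-j}} \right]_0^{x_j} \\
          &= 1 - \left( 1 - \frac{x_j}{r_j} \right)^{n-j}.
\end{aligned}
```

As expected of a cumulative function on $[0, r_j]$, $\Psi(0) = 0$ and $\Psi(r_j) = 1$.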




3.2 Sampling $[0, r_j]$ according to $\psi(x)$

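As is well known, sampling a random variable $x$ according to a given distribution function,
$\psi(x)$, is readily done once the inverse of the cumulative function of $x$, $\Psi^{-1}$, is
known: if $\xi$ is uniformly distributed in [0,1], then $\Psi^{-1}(\xi)$ is distributed according
to $\psi$. Indeed, Eq. (8) guarantees that, for any $n$ and $j$:

$$\Psi^{-1}(\xi) = r_j \left[ 1 - (1 - \xi)^{\frac{1}{n-j}} \right]. \tag{9}$$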
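
Steps 1 to 5 of Section 3, with step 3.1 carried out through the inverse cumulative function $\Psi^{-1}(\xi) = r_j [1 - (1 - \xi)^{1/(n-j)}]$, can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `sample_simplex` is hypothetical, and Python's `random` module is assumed to play the role of subroutine A.

```python
import random


def sample_simplex(n, rng=random):
    """Return one point (x_1, ..., x_n) drawn uniformly from the (n-1)-simplex."""
    if n < 1:
        raise ValueError("n must be at least 1")
    x = []
    r = 1.0  # step 1: r_1 = 1, the part of the total not yet assigned
    for j in range(1, n):  # steps 2 and 3: j = 1, ..., n-1
        xi = rng.random()  # one call to subroutine A, uniform in [0, 1)
        # step 3.1: inverse cumulative function applied to xi
        x_j = r * (1.0 - (1.0 - xi) ** (1.0 / (n - j)))
        x.append(x_j)
        r -= x_j  # step 3.2: r_{j+1} = r_j - x_j
    x.append(r)  # step 4: x_n = r_n
    return x  # step 5


# Usage: five sample distributions for a 4-valued random variable.
rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
for _ in range(5):
    print(sample_simplex(4, rng))
```

Each returned list is non-negative and sums to 1 up to floating-point rounding, so it can be used directly as a distribution function over n outcomes.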

4 Efficiency 

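The proposed algorithm is optimally efficient: in dimension $n$ it requires just $n - 1$ runs of
subroutine A, plus $n - 2$ calls of function $\Psi^{-1}$, whose complexity is constant in $n$
(for $j = n - 1$ the exponent $1/(n-j)$ equals 1, so $\Psi^{-1}$ reduces to a multiplication by
$r_j$).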